Interlinking RDF Data in Different Languages

Author

  • Tatiana Lesnikova
Abstract

The Semantic Web provides technologies such as the Resource Description Framework (RDF) for representing data on the web. The number of data sets published in RDF is growing rapidly. To connect data across RDF data sets, the data sets must be interlinked. However, resources can be described in different natural languages. Publishers such as the French National Library [1], the Spanish National Library [2], and the National British Museum make their data available in the RDF model in their own languages. Encyclopedias also exist in RDF: DBpedia, in its multilingual versions, provides structured information extracted from Wikipedia, and the XLore database [3] is an effort to publish the Chinese encyclopedias Baidu Baike and Hudong Baike in RDF. The Europeana Project aims to bring together descriptions of cultural artifacts from European cultural institutions by harvesting the metadata of its data providers; these descriptions can be in different languages. The importance of tackling multilingualism in the Semantic Web has been highlighted in [4].

Problem description. Given two RDF data sets whose resources are described in different languages, the same entity represented in both data sets has to be identified. At the instance level, property values are in different languages, which makes it harder to merge data about the same entity from different sources. The goal of our research is to identify the same entities across multilingual RDF data sets and to link them with owl:sameAs links. For this purpose, we are developing an approach that represents RDF entities as text documents and then compares them. We apply standard Natural Language Processing (NLP) techniques (document preprocessing, term weighting, similarity measures) to our data. In particular, we explore two strategies presented below.
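As a rough illustration of the general pipeline named in the abstract (entities as text documents, preprocessing, term weights, a similarity measure), the following minimal sketch may help. It is not the author's implementation and does not reproduce the two strategies. It assumes the property values have already been brought into a single language, uses TF-IDF weights and cosine similarity as one possible choice of weighting and measure, and relies on hypothetical entity identifiers, sample data, and a hypothetical `threshold` parameter.

```python
import math
import re
from collections import Counter


def entity_to_document(properties):
    """Flatten an entity's literal property values into one text document."""
    return " ".join(value for values in properties.values() for value in values)


def tokenize(text):
    """Very simple preprocessing: lowercase and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using smoothed TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_entities(source, target, threshold=0.3):
    """Map each source entity to its most similar target entity.

    `threshold` is a hypothetical cut-off: pairs scoring below it are not linked.
    """
    source_ids, target_ids = list(source), list(target)
    documents = [entity_to_document(source[i]) for i in source_ids]
    documents += [entity_to_document(target[i]) for i in target_ids]
    vectors = tfidf_vectors(documents)
    source_vecs = vectors[:len(source_ids)]
    target_vecs = vectors[len(source_ids):]
    links = {}
    for s_id, s_vec in zip(source_ids, source_vecs):
        score, t_id = max(
            (cosine(s_vec, t_vec), t_id)
            for t_id, t_vec in zip(target_ids, target_vecs)
        )
        if score >= threshold:
            links[s_id] = t_id
    return links


# Hypothetical sample data: property -> list of literal values,
# assumed to be in one language already.
source = {
    "paris": {"label": ["Paris"], "comment": ["Capital city of France"]},
    "berlin": {"label": ["Berlin"], "comment": ["Capital city of Germany"]},
}
target = {
    "e1": {"label": ["Paris"], "comment": ["Paris is the capital of France"]},
    "e2": {"label": ["Berlin"], "comment": ["Berlin is the capital of Germany"]},
}

links = link_entities(source, target)
for s_id, t_id in links.items():
    # Each accepted pair would correspond to one owl:sameAs statement.
    print(f"{s_id} owl:sameAs {t_id}")
```

Each accepted pair stands for one candidate owl:sameAs link. In a real setting the documents would be built from actual RDF literals, and the threshold would need tuning against known links.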

Similar resources

Interlinking Cross-Lingual RDF Data Sets

Linked Open Data is an essential part of the Semantic Web. More and more data sets are published in natural languages comprising not only English but other languages as well. It becomes necessary to link the same entities distributed across different RDF data sets. This paper is an initial outline of the research to be conducted on cross-lingual RDF data set interlinking, and it presents severa...

Liage de données RDF : évaluation d'approches interlingues. (RDF Data Interlinking : evaluation of Cross-lingual Methods)

The Semantic Web extends the Web by publishing structured and interlinked data using RDF. An RDF data set is a graph where resources are nodes labelled in natural languages. One of the key challenges of linked data is to be able to discover links across RDF data sets. Given two data sets, equivalent resources should be identified and linked by owl:sameAs links. This problem is particularly diff...

NLP for Interlinking Multilingual LOD

Nowadays, there are many natural languages on the Web, and we can expect that they will stay there even with the development of the Semantic Web. Though the RDF model enables structuring information in a unified way, the resources can be described using different natural languages. To find information about the same resource across different languages, we need to link identical resources togeth...

Cross-lingual RDF Thesauri Interlinking

Various lexical resources are being published in RDF. To enhance the usability of these resources, identical resources in different data sets should be linked. If lexical resources are described in different natural languages, then techniques to deal with multilinguality are required for interlinking. In this paper, we evaluate machine translation for interlinking concepts, i.e., generic entiti...

Interlinking English and Chinese RDF Data Sets Using Machine Translation

Data interlinking is a difficult task particularly in a multilingual environment like the Web. In this paper, we evaluate the suitability of a Machine Translation approach to interlink RDF resources described in English and Chinese languages. We represent resources as text documents, and a similarity between documents is taken for similarity between resources. Documents are represented as vecto...

Publication date: 2014